ABSTRACT

Motivated by applications in grid computing and project 
management, we study multiprocessor scheduling in scenar- 
ios where there is uncertainty in the successful execution 
of jobs when assigned to processors. We consider the prob- 
lem of multiprocessor scheduling under uncertainty, in which 
we are given n unit-time jobs and m machines, a directed 
acyclic graph C giving the dependencies among the jobs, 
and for every job j and machine i, the probability $p_{ij}$ of the
successful completion of job j when scheduled on machine i 
in any given particular step. The goal of the problem is to 
find a schedule that minimizes the expected makespan, that 
is, the expected completion time of all the jobs. 

The problem of multiprocessor scheduling under uncer- 
tainty was introduced by Malewicz and was shown to be 
NP-hard even when all the jobs are independent. In this pa- 
per, we present polynomial-time approximation algorithms 
for the problem, for special cases of the dag C. We obtain 
an $O(\log n)$-approximation for the case of independent jobs,
an $O(\log m \log n \log(n+m) / \log\log(n+m))$-approximation
when C is a collection of disjoint chains, an
$O(\log m \log^2 n)$-approximation when C is a collection of
directed out- or in-trees, and an
$O(\log m \log^2 n \log(n+m) / \log\log(n+m))$-approximation
when C is a directed forest.

Categories and Subject Descriptors 

F.2 [Theory of Computation]: Analysis of Algorithms 

General Terms 
1. INTRODUCTION

We study the problem of multiprocessor scheduling under 
uncertainty, which was introduced in [21] to study scenarios
where there is uncertainty in the successful completion of a 
job when assigned to a server. One motivating application 
is in grid computing, where a large collection of computers, 
often geographically distributed, cooperate to solve complex 
computational tasks. To make better use of the distributed 
computers, a task is usually divided into smaller pieces (or 
jobs) and handed to different computers. For many applica- 
tions, there could be non-trivial dependencies among these 
jobs. Due to the possible physical failures, or simply the dis- 
tributed nature of the computing environment, a machine 
may not successfully execute the assigned job on time. In 
this scenario, a natural goal is to determine a schedule of as- 
signing the given jobs to the computers so that the expected 
completion time of the task is minimized. 

A similar example, also discussed in [21], arises while managing
a large project in an organization. The project may
be broken down into small jobs with dependencies among 
them, i.e., a job may be executed only after the successful 
completion of another set of jobs. A group of workers are 
assigned to this project. Due to practical reasons and dif- 
ferent skills, a worker may not be able to finish an assigned 
job successfully on time. To decrease the chance of the po- 
tential delay of some key jobs, the project manager could 
(and would want to) assign several workers to these jobs at 
the same time. Based on past experience and the workers'
skill levels, the project manager can estimate the probability
that any particular worker successfully finishes any particular
job. The challenge for the manager is to work out a strategy
(or schedule) of assigning the workers to the jobs so that the 
expected completion time of the whole project is as small as 
possible. 

Motivated by the examples above, we study the problem 
of multiprocessor scheduling under uncertainty, henceforth 
referred to as SUU. We have a set of m machines, a set of 
n unit-time jobs, and a directed acyclic graph representing 
precedence constraints on the order of the execution of the 
jobs. We are also given, for every job j and machine i, the 
probability pij of the successful completion of job j when 
scheduled on machine i in any given particular step. To 
compensate for this uncertainty, multiple machines can be 
assigned to one job at the same time. We focus on the 
problem of computing a schedule to minimize the expected 
time to complete all the jobs, i.e., the expected makespan. 

1.1 Our results 



The multiprocessor scheduling problem SUU is shown to
be NP-hard in [21] even when all jobs are independent. In
this paper, we present approximation algorithms for SUU, 
for several special classes of dependency graphs. 

• We first consider the case when all the jobs are independent
and present an $O(\log n)$-approximation algorithm
for the problem (§3).

A crucial component of our approach to the independent 
jobs case is the formulation of a sub-problem in which we 
aim to maximize the sum of success probabilities for the 
jobs. A similar strategy, refined to handle job dependencies, 
allows us to attack the more general case where the jobs are 
not independent. 

• When the precedence constraints on the jobs form a
collection of disjoint chains, we obtain an
$O(\log m \log n \log(n+m) / \log\log(n+m))$-approximation
algorithm (§4.1). Our results rely on solving a (relaxed) linear
program and rounding the fractional solution using results
from network flow theory.

• Using the algorithm for disjoint chains and the chain
decomposition techniques of [17], we obtain
$O(\log m \log^2 n)$- and
$O(\log m \log^2 n \log(n+m) / \log\log(n+m))$-approximations
for a collection of in- or out-trees and directed
forests, respectively (§4.2).

The schedules computed by the algorithms for disjoint chains, 
trees, and directed forests, are all oblivious in the sense that 
they specify in advance the assignment of machines to jobs 
in each time step, independent of the set of unfinished jobs 
at that step. Oblivious schedules are formally defined in
§2, where we also present useful definitions and important
properties of schedules that are used in our main results.

To the best of our knowledge, our results are the first ap- 
proximation algorithms for multiprocessor scheduling under 
uncertainty problems. 

1.2 Related work 

The problem studied in our work was first defined in the
recent work by Malewicz [21], largely motivated by the
application of scheduling complex dags in grid computing [9].
Malewicz characterizes the complexity of the problem in 
terms of the number of the machines and the width of the 
dependency graph, which is defined as the maximum num- 
ber of independent jobs. He shows that when the number 
of machines and the width are both constants, the optimal 
regimen can be computed in polynomial time using dynamic 
programming. However, if either parameter is unbounded, 
the problem is NP-hard. Also, the problem cannot be
approximated within a factor of 5/4 unless P=NP. Our work
extends that of Malewicz by studying the approximability 
of the problem when neither the width of the dag nor the 
number of machines is bounded. 

The uncertainty of the scheduling problem we study comes 
from the possible failure by a machine assigned to a job, as 
modeled by the $p_{ij}$'s. There have been different models of
uncertainty in the scheduling literature. Most notable is the
model where each task has a duration of random length and
may require different amounts of resources. For related work,
see [ZlllIllESlIlllin]. 

Scheduling in general has a rich history and a vast literature.
There are many variants of scheduling problems, depending
on various factors. For example: Are the machines
related? Is the execution preemptive? Are there precedence
constraints on the execution of the jobs? Are there release
dates associated with the jobs? What is the objective function:
makespan, weighted completion time, weighted flow
time, etc.? See [13] for a survey and [12, 20, 28, 19, 4, 17]
for representative work.

Two particular variants of scheduling closely related to
our work are job shop scheduling [27] and the scheduling of
unrelated machines under precedence constraints. In the
job shop scheduling problem, we are given m machines and 
n jobs, each job consisting of a sequence of operations. Each 
operation must be processed on a specified machine. A job 
is executed by processing its operations according to the as- 
sociated sequence. At most one job can be scheduled on any 
machine at any time. The goal of the job shop scheduling 
problem is to find a schedule of the jobs on the machines 
that minimizes the maximum completion time. This problem
is strongly NP-hard and widely studied [10, 18, 1]. Also
extensively studied is the problem of preemptively scheduling
jobs with precedence constraints on unrelated parallel
machines [19, 27, 17], where the processing time of a job depends
on the machine to which it is assigned. One common characteristic
of this problem and SUU is that in each problem,
the capability of a machine i to complete a job j may vary 
with both i and j. However, while the unrelated parallel 
machines problem models this nonuniformity using deter- 
ministic processing times that vary with i and j, in SUU the 
jobs are all unit-size but may fail to complete with probabil- 
ities that vary with i and j. Owing to the uncertainty in the 
completion of jobs, SUU schedules appear to be more diffi- 
cult to specify and analyze. One other technical difference 
is that in SUU we allow multiple machines to be assigned 
to the same job at the same time, for the purpose of rais- 
ing the probability of successfully completing the job. The 
unrelated parallel machines problem is typically solved by a 
reduction to instances of the job shop scheduling problem. 
Some of our SUU algorithms also include similar reductions. 

2. SCHEDULES, SUCCESS PROBABILITIES, AND MASS

In this section, we present formal definitions of a schedule
(§2.1), introduce the notion of the mass of a job, and prove
a key technical theorem about the accumulation of mass
of a job within the expected makespan of a given schedule
(§2.2).

2.1 Schedules 

In SUU, we are given a set J of n unit-step jobs, and a set 
M of m machines. There are precedence constraints among 
the jobs, which form a directed acyclic graph (dag) C. A job 
j is eligible for execution at step t if all the jobs preceding j 
according to the precedence constraints have been successfully
completed before t. For every job j and machine i, we
are also given $p_{ij}$, which is the probability that job j, when
scheduled on machine i, will be successfully completed,
independent of the outcome of any other execution. Multiple
machines can be assigned to the same job at the same step. 
Without loss of generality, we assume that for each j, there 
exists a machine i such that pij > 0. 

Definition 2.1. A schedule $\Sigma$ of length $T \in \mathbb{Z}^+ \cup \{\infty\}$
is a collection of functions
$\{f_{S,t} : M \to J \cup \{\bot\} \mid S \subseteq J,\ 1 \le t < T+1\}$.
An execution of the schedule $\Sigma$ means that,
at the start of each step t, if S is the set of unfinished jobs:
machine i is assigned to job $f_{S,t}(i)$ if $f_{S,t}(i)$ is eligible and
belongs to S; otherwise, i is idle for that step.

Our formal definition of a schedule specifies assignment
functions $f_{S,t}$ for infinite t. This is because there is a positive
probability that a job j is not yet completed by any
given step if $p_{ij} < 1$ for all i. For the purposes of optimizing
expected makespan, however, we can restrict our attention to
a restricted class of schedules.

Definition 2.2 ([21]). A regimen $\Sigma_g$ is a schedule
in which $f_{S,t_1}(\cdot) = f_{S,t_2}(\cdot)$ for any $S \subseteq J$ and $t_1 \ne t_2$.
In other words, the assignment functions $f_{S,t}$ depend only
on the unfinished job set S. Thus, we can specify $\Sigma_g$ by a
complete collection of functions $\{f_S : M \to S \cup \{\bot\} \mid S \subseteq J\}$.

We denote the minimum expected makespan for a given
SUU instance by $T^{\mathrm{OPT}}$, which is finite because for any job
j, there exists a machine i such that $p_{ij} > 0$. It is not hard
to see that there exists an optimal schedule which is a regimen
because at any step t, one can determine an optimal
assignment function, which only depends on the subset of
unfinished jobs at step t and is independent of the past execution
history or the value t. While a naive specification of
an arbitrary regimen uses $2^n$ different assignment functions,
certain regimens can be specified succinctly, for instance,
by a polynomial-length function that takes S as input and
returns $f_S$. In this paper, we also consider a different restricted
class of schedules, called oblivious schedules.

Definition 2.3. An oblivious schedule is a schedule
in which every assignment function $f_{S,t}$ is independent of S,
i.e., for all $t, S, S'$, $f_{S,t}(\cdot) = f_{S',t}(\cdot)$. Hence, the assignment
functions at any step t can be specified by a single function,
which we denote by $f_t$.

Oblivious schedules are appealing for two reasons. First, 
at any step t, only one assignment function is needed, re- 
gardless of the actual unfinished job set S occurring at step 
t. Recall that there could be many different such S at a 
given t because of the execution uncertainty. The second 
benefit is more technical: oblivious schedules allow us to 
address the uncertainty in the SUU problem by solving re- 
lated deterministic optimization problems. 

2.2 Success probabilities and mass 

When a subset of machines $S \subseteq M$ is assigned to j in any
time step, the probability that j is successfully completed is
$1 - \prod_{i \in S}(1 - p_{ij})$. For ease of approximation, the following
proposition is useful to us.

Proposition 2.1. Given $x_1, \ldots, x_k \in [0,1]$,
$1 - (1-x_1)\cdots(1-x_k) \le x_1 + \cdots + x_k$.
Furthermore, if $x_1 + \cdots + x_k \le 1$, then
$1 - (1-x_1)\cdots(1-x_k) \ge e^{-1}(x_1 + \cdots + x_k)$.

Proof. The first assertion follows from the inequality
$(1-x_1)\cdots(1-x_k) \ge 1 - (x_1 + \cdots + x_k)$, which can be proved
using a simple induction argument. The base case of k = 1
is trivial. Suppose the inequality holds for k - 1. If
$x_1 + \cdots + x_{k-1} \ge 1$, then the inequality holds for k because
its right-hand side is nonpositive. Otherwise,
according to the induction hypothesis,

$$
\begin{aligned}
(1-x_1)\cdots(1-x_{k-1})(1-x_k)
  &\ge [1 - (x_1 + \cdots + x_{k-1})](1 - x_k) \\
  &\ge 1 - (x_1 + \cdots + x_k).
\end{aligned}
$$

For the second assertion, notice that if $0 \le x \le 1$, then
$1 - x \le e^{-x} \le 1 - x/e$. Since $1 - x \le e^{-x}$, we have
$(1-x_1)\cdots(1-x_k) \le e^{-x_1}\cdots e^{-x_k}$, and hence

$$
\begin{aligned}
1 - (1-x_1)\cdots(1-x_k)
  &\ge 1 - e^{-x_1}\cdots e^{-x_k} \\
  &= 1 - e^{-(x_1 + \cdots + x_k)} \\
  &\ge \frac{x_1 + \cdots + x_k}{e},
\end{aligned}
$$

where the last inequality follows because $e^{-x} \le 1 - x/e$ for
$x \in [0,1]$ and the assumption that $x_1 + \cdots + x_k \le 1$. □

Proposition 2.1 suggests that we can approximate the success
probability with a convenient linear form.
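As a quick sanity check of Proposition 2.1, the following minimal Python sketch compares the exact success probability with the two linear bounds on randomly drawn inputs. The function names, the random test harness, and the floating-point tolerance `tol` are illustrative assumptions and are not part of the paper.

```python
import math
import random


def success_probability(xs):
    """Exact probability that at least one of several independent attempts succeeds."""
    fail = 1.0
    for x in xs:
        fail *= 1.0 - x
    return 1.0 - fail


def check_proposition(xs, tol=1e-12):
    """True if sum(xs)/e <= success probability <= sum(xs); assumes sum(xs) <= 1."""
    total = sum(xs)
    exact = success_probability(xs)
    return total / math.e - tol <= exact <= total + tol


# Hypothetical harness: random vectors rescaled so that their sum is at most 1.
rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(1, 6)
    raw = [rng.random() for _ in range(k)]
    scale = rng.random() / max(sum(raw), 1e-9)
    xs = [r * scale for r in raw]
    assert check_proposition(xs)
```

The lower bound is only claimed when the probabilities sum to at most 1, which is why the harness rescales each sample before checking it.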

Definition 2.4. For any schedule $\Sigma$, we define the mass
of a job j at the end of step t to be the sum, over all times
$t' \in [1,t]$ and over every machine i to which j is assigned
at time $t'$, of $p_{ij}$. Thus, for an arbitrary schedule, the mass
of a job j at time t is a random variable. For an oblivious
schedule $\Sigma_o$, the mass of j at the end of any step t is simply

$$\min\Big\{ \sum_{t' \in [1,t]} \ \sum_{i \,:\, f_{t'}(i) = j} p_{ij},\ 1 \Big\},$$

where $f_t(\cdot)$ is the assignment function of $\Sigma_o$ at step t. We
say that j accumulates that mass by step t.
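The mass of a job under an oblivious schedule can be computed directly from the assignment functions. The sketch below is one possible encoding, assuming independent jobs (so eligibility is not checked); the data layout (a list of per-step dictionaries and a dictionary of probabilities keyed by `(machine, job)`) is a hypothetical choice made for illustration.

```python
def oblivious_mass(schedule, p, job, t):
    """Mass of `job` at the end of step t under an oblivious schedule.

    schedule: list of dicts; schedule[s][machine] is the job assigned to
              `machine` in step s + 1, or None if the machine is idle.
    p:        dict mapping (machine, job) to the success probability p_ij.
    """
    total = 0.0
    for assignment in schedule[:t]:
        for machine, assigned in assignment.items():
            if assigned == job:
                total += p[(machine, job)]
    # The mass of a job is capped at 1, as in Definition 2.4.
    return min(total, 1.0)


# Illustrative instance with two machines (0, 1) and two jobs ('a', 'b').
p = {(0, "a"): 0.3, (1, "a"): 0.5, (0, "b"): 0.4, (1, "b"): 0.2}
schedule = [{0: "a", 1: "a"}, {0: "a", 1: "b"}]
mass_a = oblivious_mass(schedule, p, "a", 2)
mass_b = oblivious_mass(schedule, p, "b", 2)
```

In this instance job 'a' receives 0.3 + 0.5 in the first step and another 0.3 in the second, so its uncapped sum exceeds 1 and the cap applies.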

The following theorem is crucial for our approach to the
scheduling problem. We emphasize that it holds for an arbitrary
SUU instance. It is used in the proofs of Theorem 3.1
and Lemma 4.2.

Theorem 2.2. Let $\Sigma$ be a schedule for an SUU instance,
whose expected makespan is T. For any job j, in an execution
of $\Sigma$ for 2T steps, with probability at least 1/4, j
accumulates a mass of at least 1/4.

Proof. Let A be the event that j is finished within step
2T. Let $S_t$ be the random variable denoting the collection of
machines assigned to job j at step t and $P(S_t) = \sum_{i \in S_t} p_{ij}$.
Let B be the event that $\sum_{1 \le t \le 2T} P(S_t) \le 1/4$. What we
want to prove is $\Pr(B^c) \ge 1/4$. Observe that $\Pr(A)$ equals
$\Pr(A \cap B) + \Pr(A \cap B^c)$, which is at most $\Pr(A \cap B) + \Pr(B^c)$.

We estimate the value of $\Pr(A \cap B)$ below. Observe that all
possible executions of $\Sigma$ on the jobs form an infinite rooted
tree, in which each node represents an intermediate state
during an execution (see Figure 1 for an illustration). Each
node has an associated set of jobs, representing the unfinished
jobs at that state. For a node N, let Jobs(N) be its
associated set of unfinished jobs. Note that Jobs(R) for the
root node R at level 0 consists of the entire set of jobs. The
nodes at level k denote the states after k steps. From each
node N at level k to each node Q at level k + 1, we can
compute the corresponding transition probability according
to the assignment function $f_{\mathrm{Jobs}(N),k+1}$.

Lemma 2.3. Consider a tree node N at level k, where
$j \in \mathrm{Jobs}(N)$. For $1 \le t \le k$, let $S_t$ be the machine set
assigned to j during step t along the path leading to N from
R. Assume that $\sum_{1 \le t \le k} P(S_t) \le c$, where $c \le 1$. And let
$P(j,N)$ be the probability that j will be finished by level (step)
2T following a tree path through N and
$\sum_{1 \le t \le 2T} P(S_t) \le c$.

Then $P(j,N) \le c - \sum_{1 \le t \le k} P(S_t)$.




[Figure 1 panels: left, a Markov chain for a regimen; right, an infinite execution tree for a schedule.]

Figure 1: An illustration of the schedule. For simplicity,
we only use 3 jobs. Each node represents
an intermediate state, with its associated set of unfinished
jobs appearing inside. The number close to an
edge represents its transition probability. The left graph
is a Markov chain representation of a regimen. The right
graph is a rooted tree representation of the execution of
a schedule. To avoid clutter, we only show the complete
transitions for nodes {1,2} and {1} at step 2.

Proof of Lemma: We prove the lemma by backward induction
on the level number k. Consider the base case: N's
level is 2T - 1. We only need to execute the schedule for
one more step. Let $S_{2T}$ be the set of machines assigned to
j during step 2T. If $P(S_{2T}) > c - \sum_{1 \le t \le 2T-1} P(S_t)$, then
$P(j,N) = 0$. Otherwise, the probability that j is finished
within this step is at most $P(S_{2T})$. In either case, the claim
is true.

We now assume that the claim is true for level $k \le 2T - 1$
and prove that it is also true for
level k - 1. Consider a tree node N at level k - 1. Let $S_k$ be
the set of machines assigned to j during step k according to
the assignment function $f_{\mathrm{Jobs}(N),k}$. A child node of N at level k
either does not contain j (j is finished at step k) or contains
j (j is not finished at step k). Let the probabilities of the
two cases be $P_1$ and $1 - P_1$, respectively. Denote the set of all the
children nodes where j is still unfinished by L.

If $P(S_k) > c - \sum_{1 \le t \le k-1} P(S_t)$, then $P(j,N) = 0$, which
is $\le c - \sum_{1 \le t \le k-1} P(S_t)$. Otherwise,

$$
\begin{aligned}
P(j,N) &\le P_1 + \sum_{Q \in L} \Pr(N \to Q)\, P(j,Q) \\
       &\le P_1 + (1 - P_1)\Big(c - \sum_{1 \le t \le k} P(S_t)\Big) \\
       &\le c - \sum_{1 \le t \le k-1} P(S_t),
\end{aligned}
$$

where $\Pr(N \to Q)$ denotes the transition probability from N to Q,
the second inequality follows from the induction hypothesis,
and the last inequality follows from the fact that
$P_1 \le P(S_k)$. This proves the induction step and hence the
Lemma. □

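By invoking the lemma with c = 1/4, we obtain
$\Pr(A \cap B) = P(j,R) \le c = 1/4$. Hence $\Pr(A) \le 1/4 + \Pr(B^c)$.
Since the expected makespan is T, Markov's inequality implies that
all jobs, and in particular j, are finished within 2T steps with
probability at least 1/2, i.e., $\Pr(A) \ge 1/2$. We conclude
that $\Pr(B^c) \ge 1/4$, completing the proof.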

3. INDEPENDENT JOBS 



In this section, we study a special case of the scheduling
problem, where the jobs are independent. We refer to
this problem as SUU-I. To compute a solution to SUU-I,
we first establish that there exists an oblivious schedule in
which the total mass accumulated by the jobs in $O(T^{\mathrm{OPT}})$
steps is $\Omega(n)$. To find such a schedule, we formulate a subproblem
for maximizing the total sum of masses and then
give polynomial-time algorithms to compute an $O(\log n)$-approximate
schedule and an $O(\log^2 n)$-approximate oblivious
schedule for SUU-I. For oblivious schedules, we improve
the approximation factor to $O(\log n \cdot \log(\min\{n, m\}))$ when
we study the more general case with chain-like precedence
constraints in §4.1.



Theorem 3.1. If there exists a schedule $\Sigma$ for SUU-I with
expected makespan T, then there exists an oblivious schedule
of length 2T, in which the total mass accumulated by all jobs
is at least n/16.

Proof. Consider an execution $\mathcal{E}$ of $\Sigma$ for 2T steps. This
execution naturally yields an oblivious schedule $\Sigma_{\mathcal{E}}$ of length
2T, whose assignment functions $f_t(\cdot)$ are defined as follows:
$f_t(i) = j$ if machine i is assigned to job j at step t in $\mathcal{E}$.
Note that due to execution uncertainty, $\mathcal{E}$, and hence $\Sigma_{\mathcal{E}}$,
are both random variables. By Theorem 2.2, for any job
j, with probability at least 1/4, j accumulates a mass of
at least 1/4 by step 2T in $\Sigma_{\mathcal{E}}$. Thus, the expected mass
of j at step 2T in $\Sigma_{\mathcal{E}}$ is at least 1/16. This implies that
the expected total mass of all the jobs at step 2T in $\Sigma_{\mathcal{E}}$ is
at least n/16. Therefore, there exists an oblivious schedule
in which the total mass of the jobs at step 2T is at least
n/16. □

3.1 An O(log n)-approximate schedule for SUU-I

Motivated by Theorem 3.1, we formulate subproblem MaxSumMass
for maximizing the sum of masses. In MaxSumMass,
we are given a set J of n independent, unit-step jobs,
a set M of m machines, and the probabilities $p_{ij}$, and the
goal is to find an assignment $f : M \to J \cup \{\bot\}$ for a single
step that maximizes the sum of masses over the jobs in
the step. In Figure 2, we present a 1/3-approximation algorithm
MSM-ALG for MaxSumMass (which can be shown to
be NP-hard), and our approximation algorithm for SUU-I,
which simply executes, in every step, MSM-ALG on the
unfinished jobs.

Theorem 3.2. MSM-ALG computes a 1/3-approximate
solution to Problem MaxSumMass.

Algorithm MSM-ALG
INPUT: Jobs J, machines M, $p_{ij}$'s.

• Set $f(i)$ to nil for every $i \in M$.

• For each $p_{ij}$ in nonincreasing order: if $f(i)$ is nil and
$\sum_{x : f(x) = j} p_{xj} + p_{ij} \le 1$, assign i to j, i.e., $f(i) \leftarrow j$.

• For every unused machine i, set $f(i) \leftarrow \bot$; output f.

Algorithm SUU-I-ALG
INPUT: Jobs J, machines M, $p_{ij}$'s.

• Let $S_t$ denote the set of unfinished jobs at the start of
step t.

• In each step t, schedule according to the assignment determined
by MSM-ALG applied to $S_t$ and all machines.

Figure 2: An approximation algorithm for scheduling independent jobs.

Proof. Consider a bipartite graph in which the nodes on one
side are the jobs J and the nodes on the other side are the
machines M. There is an edge (i, j) between machine i
and job j whenever $p_{ij} > 0$. MSM-ALG can be viewed as
picking and orienting the edges. Let Opt = {(i, j)} be the
collection of edges picked by the optimum assignment $f^*$.
Let Sol be the solution computed by MSM-ALG. We use
a charging argument below. Consider any edge $(i, j) \in$ Opt.

1. $(i, j) \in$ Sol: charge $p_{ij}$ to itself.

2. $(i, j) \notin$ Sol:

(a) (i, j) is not added because in the second step of MSM-ALG,
$f(i) \ne$ nil. Let $j' = f(i)$. Charge $p_{ij}$ to $p_{ij'}$, where
$(i, j') \in$ Sol. Notice that $p_{ij} \le p_{ij'}$, and $p_{ij'}$ will be
charged at most once due to this situation because
each machine i in Opt is used at most once.

(b) (i, j) is not added because in the second step of MSM-ALG,
$f(i) =$ nil but $\sum_{x : f(x) = j} p_{xj} + p_{ij} > 1$. Since the $p_{ij}$'s
are processed in nonincreasing order, we conclude that in
Sol, $\sum_{x : f(x) = j} p_{xj} > 1/2$. Charge $p_{ij}$ to
$\sum_{x : f(x) = j} p_{xj}$.

Observe that one copy of Sol is sufficient to cover the charges
of types 1 and 2(a). Two copies of Sol are sufficient to cover
the charges of type 2(b) because, by definition, the mass of
any job is at most 1 in any assignment.

We conclude that MSM-ALG computes a solution with
an approximation factor 1/3. □
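The greedy rule of MSM-ALG can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary-based input format, the tie-breaking order among equal probabilities, and the small floating-point slack `eps` are all choices made here for the example.

```python
def msm_alg(jobs, machines, p, eps=1e-12):
    """Greedy single-step assignment in the spirit of MSM-ALG.

    p maps (machine, job) to p_ij; missing pairs are treated as 0.
    Returns (f, mass): f[machine] is a job or None (idle), and
    mass[job] is the total probability assigned to that job.
    """
    f = {i: None for i in machines}
    mass = {j: 0.0 for j in jobs}
    # Process the pairs in nonincreasing order of p_ij.
    pairs = sorted(
        ((p.get((i, j), 0.0), i, j) for i in machines for j in jobs),
        key=lambda entry: -entry[0],
    )
    for pij, i, j in pairs:
        if pij <= 0.0:
            break  # all remaining pairs have zero probability
        # Assign i to j only if i is free and j's mass stays at most 1.
        if f[i] is None and mass[j] + pij <= 1.0 + eps:
            f[i] = j
            mass[j] += pij
    return f, mass


# Illustrative instance: three machines, two jobs.
p = {
    (0, "a"): 0.9, (1, "a"): 0.6, (2, "a"): 0.1,
    (0, "b"): 0.5, (1, "b"): 0.4, (2, "b"): 0.3,
}
f, mass = msm_alg(["a", "b"], [0, 1, 2], p)
```

On this instance machine 0 goes to job 'a', machine 1 is rejected for 'a' because the mass would exceed 1, and machines 1 and 2 both end up on job 'b'.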

Theorem 3.3. Algorithm SUU-I-ALG is an $O(\log n)$-approximation
algorithm for SUU-I.

Proof. Let $S_t$ denote the set of unfinished jobs at the
start of step t. Then, by Theorem 3.1, there exists an oblivious
schedule of length $2T^{\mathrm{OPT}}$ starting from step t, in which the
total mass of all jobs in $S_t$ is at least $|S_t|/16$. By averaging
over the $2T^{\mathrm{OPT}}$ time steps of this schedule, there exists an
assignment of jobs to machines in step t such that the total
mass of the jobs in $S_t$ in step t is at least $|S_t|/(32\,T^{\mathrm{OPT}})$.
By Theorem 3.2, in step t of SUU-I-ALG, the total mass
of the jobs accumulated in step t is at least $|S_t|/(96\,T^{\mathrm{OPT}})$.
By Proposition 2.1, it follows that the expected number of
jobs that complete in step t is at least $|S_t|/(96e\,T^{\mathrm{OPT}})$.

We thus have a sequence of random variables $S_t$ which satisfy
the property $E[\,|S_{t+1}| \mid S_t\,] \le |S_t|\,(1 - 1/(96e\,T^{\mathrm{OPT}}))$. By
straightforward Chernoff bound arguments [3, 15], we obtain
that with high probability, $S_t$ is empty within $O(T^{\mathrm{OPT}} \log n)$
steps. □
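The high-probability bound above can also be seen through a short calculation that uses only Markov's inequality on the expected number of unfinished jobs. This is a sketch of one possible argument, not necessarily the one intended by the cited Chernoff bound references; the shorthand $q$ is introduced here for convenience.

```latex
% Assumption: q := 1/(96 e T^{OPT}) and |S_1| = n.
\begin{align*}
E\big[|S_{t+1}|\big]
  &= E\big[\,E[\,|S_{t+1}| \mid S_t\,]\,\big]
   \le (1-q)\,E\big[|S_t|\big]
   \le \cdots \le (1-q)^{t}\, n \le n\,e^{-qt}. \\
\intertext{Choosing $t = (2\ln n)/q = 192e\,T^{\mathrm{OPT}} \ln n$ gives}
E\big[|S_{t+1}|\big] &\le n \cdot e^{-2\ln n} = \frac{1}{n}, \\
\Pr\big(|S_{t+1}| \ge 1\big) &\le E\big[|S_{t+1}|\big] \le \frac{1}{n}.
\end{align*}
```

Under these assumptions, all jobs are finished within $O(T^{\mathrm{OPT}} \log n)$ steps with probability at least $1 - 1/n$.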

3.2 An approximate oblivious schedule for SUU-I

The schedule computed by SUU-I-ALG is adaptive in
the sense that the assignment function for each step is dependent
on the set of unfinished jobs at the start of the
step. Using an extension of MSM-ALG, we develop in this
section a polynomial-time combinatorial algorithm to compute
an oblivious schedule with expected makespan within
an $O(\log^2 n)$ factor of the optimal. In §4.1, we improve this bound
further to $O(\log n \cdot \log(\min\{n, m\}))$ using an LP-based
algorithm.

According to Theorem 3.1, there exists an oblivious schedule
of length $2T^{\mathrm{OPT}}$, in which the total mass of all jobs is at
least n/16. Intuitively, if one computes an oblivious schedule
$\Sigma_1$ of length $2T^{\mathrm{OPT}}$ with the aim of maximizing the
total sum of masses over the jobs, there should be many
jobs accumulating constant masses in $\Sigma_1$. One can then remove
those jobs and compute a second oblivious schedule
$\Sigma_2$ of length $2T^{\mathrm{OPT}}$ to maximize the total sum of masses for
the remaining jobs, to remove some additional jobs which
have accumulated constant masses. Since each computation
of the oblivious schedule removes many jobs, this process
should terminate quickly. By concatenating $\Sigma_1, \Sigma_2, \ldots$
together, one obtains an oblivious schedule $\Sigma$ in which every
job accumulates constant mass.

By Theorem 3.2, we have a 1/3-approximation algorithm
for Problem MaxSumMass. However, MaxSumMass only considers
oblivious schedules of length 1, i.e., each machine is
assigned to at most one job. What we need is a procedure for
finding an oblivious schedule of length $2T^{\mathrm{OPT}}$ that maximizes
the sum of masses over jobs. It turns out that one can
easily extend MSM-ALG to take into account the schedule
length, which can be arbitrary, and still obtain the same
approximation factor of 1/3. We now formalize our discussion.

Problem (MaxSumMass-Ext): We are given a set J of
n independent, unit-step jobs and a set M of m machines.
Let $p_{ij}$ denote the probability that job j is successfully completed
if assigned to machine i. We are also given a parameter
$t \in \mathbb{Z}^+$. The goal of the problem is to find an oblivious
schedule $\Sigma_o$ of length t such that the total sum of masses
accumulated by the jobs by step t is maximized.

We show below Algorithm MSM-E-ALG, which outputs
an oblivious schedule $\Sigma_o$ of length $t \in \mathbb{Z}^+$ that is a 1/3-approximate
solution to Problem MaxSumMass-Ext. Algorithm
MSM-E-ALG is a simple modification of MSM-ALG
as follows. Since the schedule is of length t, each machine
can be assigned t times. We maintain a remaining capacity
parameter $t_i$ for each machine, initialized to the value
t, to keep track of how many steps machine i is still available
to be assigned. We also use $x_{ij}$ to keep track of how
many steps machine i is assigned to job j. In Step 2(a)
of MSM-E-ALG, as long as $t_i$ is positive, assign i to j
for as many steps as necessary. In Step 2(b), we update
$t_i$ accordingly. In Step 3, we output an oblivious schedule
$\Sigma_o = \{f_\tau(\cdot) : 1 \le \tau \le t\}$, which can be specified by the $x_{ij}$'s as
follows. Let $j_1, \ldots, j_n$ be an ordering of the jobs. Then $f_\tau(i) = j_k$
for $\sum_{1 \le l < k} x_{i j_l} + 1 \le \tau \le \sum_{1 \le l \le k} x_{i j_l}$ and $1 \le k \le n$.
Observe that the running time of MSM-E-ALG is independent
of the value t because each $p_{ij}$, hence each pair
(i, j), is processed exactly once in Step 2. It is not hard
to see that MSM-E-ALG outputs a 1/3-approximate solution
to Problem MaxSumMass-Ext because an analysis similar to
that of MSM-ALG in Theorem 3.2 can be applied.

Lemma 3.4. MSM-E-ALG computes a solution to Prob- 
lem MaxSumMass-Ext with an approximation factor 1/3. 

We now present an approximation algorithm SUU-I-OBL 
for Problem SUU-I. 

Algorithm 1 MSM-E-ALG
INPUT: Jobs J, machines M, $p_{ij}$'s and t.

1. Sort the $p_{ij}$'s in decreasing order. Initialize: $\forall i,\ t_i \leftarrow t$;
$\forall i, j,\ x_{ij} \leftarrow 0$.

2. For each $p_{ij}$ according to the order:

(a) $x_{ij} \leftarrow \min\big\{ t_i,\ \big\lfloor (1 - \sum_{i'} p_{i'j}\, x_{i'j}) / p_{ij} \big\rfloor \big\}$.

(b) $t_i \leftarrow t_i - x_{ij}$.

3. Output $\Sigma_o$ specified by the $x_{ij}$'s.

Algorithm 2 SUU-I-OBL
INPUT: Jobs J, machines M, $p_{ij}$'s.

1. $t \leftarrow 1$.

2. $\ell \leftarrow 1$. $R \leftarrow J$. $\Sigma \leftarrow$ "empty schedule".

3. While ($|R| > 0$) and ($\ell \le 66 \log n$)

(a) Let $\Sigma_\ell$ be the output of invoking MSM-E-ALG
on R, M with the current t value. $\Sigma \leftarrow \Sigma \circ \Sigma_\ell$.

(b) Remove jobs that accumulate at least 1/96 mass
from R.

(c) $\ell \leftarrow \ell + 1$.

4. If $|R| > 0$, then $t \leftarrow 2t$, GOTO step 2; otherwise,
return $\Sigma$.

A few comments on SUU-I-OBL are in order. We use
MSM-E-ALG repeatedly to accumulate constant masses
for a good fraction of the jobs each round, until all jobs accumulate
constant masses. There is still one obstacle though.
Since we do not know the value of $T^{\mathrm{OPT}}$, we have to "guess"
a value of t for MSM-E-ALG, which must be large enough,
e.g., at least $2T^{\mathrm{OPT}}$, to ensure that there exists an oblivious
schedule of length t in which the total mass is at least
n/16, as proved in Theorem 3.1. In summary, in the loop of
SUU-I-OBL (Step 3), we repeatedly invoke MSM-E-ALG
to accumulate 1/96 mass for the jobs, for at most 66 log n
rounds (we will explain the reason shortly). At the end of the
loop (Step 4), if there are some remaining jobs, our t value
is not large enough; we hence double the value of
t and try the new t again by resetting the other parameters.
Note that during each invocation of MSM-E-ALG, we start
from scratch by ignoring any mass that the jobs may have
accumulated in the previous rounds. We now analyze the
performance of SUU-I-OBL.
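The counting scheme of MSM-E-ALG and the expansion of the $x_{ij}$ values into an oblivious schedule can be sketched as follows. Several details are assumptions made for this illustration: the number of steps granted in Step 2(a) is taken to be the largest count that keeps the job's mass at most 1 (a floor), pairs with zero probability are skipped, a small floating-point slack is added before rounding down, and the input format is hypothetical.

```python
import math


def msm_e_alg(jobs, machines, p, t):
    """Return x[(i, j)]: number of steps machine i spends on job j (length-t schedule)."""
    remaining = {i: t for i in machines}  # remaining capacity t_i of each machine
    mass = {j: 0.0 for j in jobs}         # mass accumulated so far by each job
    x = {}
    pairs = sorted(
        ((p[(i, j)], i, j) for i in machines for j in jobs if p.get((i, j), 0.0) > 0.0),
        key=lambda entry: -entry[0],
    )
    for pij, i, j in pairs:
        # Assumed Step 2(a): as many steps as fit without pushing j's mass above 1.
        steps = min(remaining[i], math.floor((1.0 - mass[j]) / pij + 1e-9))
        steps = max(steps, 0)
        x[(i, j)] = steps
        remaining[i] -= steps             # Step 2(b)
        mass[j] += steps * pij
    return x


def to_oblivious_schedule(x, jobs, machines, t):
    """Expand the counts into per-step assignments f_1, ..., f_t (None means idle)."""
    schedule = [{i: None for i in machines} for _ in range(t)]
    for i in machines:
        step = 0
        for j in jobs:                    # fixed job ordering j_1, ..., j_n
            for _ in range(x.get((i, j), 0)):
                schedule[step][i] = j
                step += 1
    return schedule


# Illustrative instance: two machines, two jobs, schedule length 3.
p = {(0, "a"): 0.5, (1, "a"): 0.25, (0, "b"): 0.2, (1, "b"): 0.1}
x = msm_e_alg(["a", "b"], [0, 1], p, 3)
schedule = to_oblivious_schedule(x, ["a", "b"], [0, 1], 3)
```

Computing `x` touches each pair once, matching the observation that the running time does not depend on t; only the explicit expansion into per-step assignments takes time proportional to t.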

If $t \ge 2T^{\mathrm{OPT}}$, with one invocation of MSM-E-ALG using
t, let x be the number of jobs that get at least 1/96 mass.
The total sum of masses over the jobs is at most
$x \cdot 1 + (n - x) \cdot 1/96$ because the mass that any job accumulates is at
most 1. From Theorem 3.1, we know that there exists an
oblivious schedule of length t, with a total sum of mass at
least n/16. Now according to Lemma 3.4, MSM-E-ALG
has an approximation ratio of 1/3. Thus,

$$x \cdot 1 + (n - x) \cdot 1/96 \ge 1/3 \cdot n/16.$$

It follows that $x \ge n/95$. Since each invocation of
MSM-E-ALG makes at least 1/95 of the jobs accumulate
1/96 mass, it is sufficient to invoke MSM-E-ALG at most
66 log n times until all jobs accumulate at least 1/96 mass.
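The constant 66 can be checked numerically. The sketch below counts how many rounds are needed before fewer than one job can remain when each round removes at least a 1/95 fraction of the remaining jobs; it assumes that the logarithm in "66 log n" is base 2, which the text does not state explicitly.

```python
import math


def rounds_needed(n, fraction=1 / 95):
    """Smallest k with n * (1 - fraction)**k < 1, i.e. no job can remain after k rounds."""
    k = 0
    remaining = float(n)
    while remaining >= 1.0:
        remaining *= 1.0 - fraction
        k += 1
    return k


# Assumption: log is base 2. The bound holds for each sampled n.
for n in (2, 10, 1000, 10**6):
    assert rounds_needed(n) <= math.ceil(66 * math.log2(n))
```

For n = 2 the count is exactly 66, which suggests the constant was chosen as the smallest integer that works with a base-2 logarithm.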

To prove that SUU-I-OBL terminates in polynomial time,
we first bound the value of $T^{\mathrm{OPT}}$. Let $p_{\min} = \min_{i,j} p_{ij}$.
Obviously, if we let the jobs accumulate sufficient mass one
by one by assigning all machines to a single job at any step,
then every job accumulates a mass of at least 1 within a time
interval of $n / p_{\min}$. This implies that
$T^{\mathrm{OPT}} = O(\frac{n}{p_{\min}} \log n)$.
Since t is doubling every iteration in SUU-I-OBL,
$O(\log n + \log \frac{1}{p_{\min}})$ different t values will be "probed" before the
algorithm terminates. With each t value, we invoke MSM-E-ALG
at most 66 log n times, and each such invocation runs in
polynomial time. We conclude that algorithm SUU-I-OBL
terminates within time polynomial in the size of the input.
We have thus proved:

Lemma 3.5. For Problem SUU-I, one can compute in polynomial
time an oblivious schedule of length $O(\log n) \cdot T^{\mathrm{OPT}}$ in
which every job accumulates a mass of at least 1/96.

Theorem 3.6. For Problem SUU-I, within polynomial time,
we can compute an oblivious schedule whose expected makespan
is within a factor of $O(\log^2 n)$ of the optimal.



Proof. Using Lemma 3.5, we first compute an oblivious schedule $\Sigma_o$ of length $T = O(\log n) \cdot T^{OPT}$ in which every job accumulates a mass of at least 1/96. The infinite repetition of $\Sigma_o$, denoted $\Sigma_o^\infty$, is the oblivious schedule we want. Treating the execution of $\Sigma_o^\infty$ during each step interval $[k \cdot T + 1, (k + 1) \cdot T]$, where $k = 0, 1, \ldots$, as one iteration, by Proposition 2.1 we know that every job has a success probability of at least $\frac{1}{96e}$ during each iteration. Within O(log n) iterations, all jobs are finished with high probability. Thus, the expected makespan of $\Sigma_o^\infty$ is within a factor of $O(\log^2 n)$ of $T^{OPT}$. We now formalize this argument.

Let random variable X be the iteration number when all 
jobs are finished. We bound the expected value of X below. 

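$$
\begin{aligned}
E[X] &= \sum_{i=0}^{\infty} \Pr(X > i) \\
&= \sum_{i=0}^{362 \log n - 1} \Pr(X > i) + \sum_{i=362 \log n}^{\infty} \Pr(X > i) \\
&\le 362 \log n \cdot 1 + \sum_{i=362 \log n}^{\infty} n \left(1 - \frac{1}{96e}\right)^{i} \\
&= 362 \log n + n \left(1 - \frac{1}{96e}\right)^{362 \log n} \cdot 96e \\
&\le 362 \log n + 96e,
\end{aligned}
$$

where the first inequality follows because every job has a success probability of at least $\frac{1}{96e}$ within each iteration (together with a union bound over the n jobs), and the last inequality follows by summing the geometric series and the fact that $\left(1 - \frac{1}{96e}\right)^{362} < 1/2$. This completes the proof of the theorem. □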
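The bound on E[X] can be sanity-checked numerically. The sketch below is illustrative only: it assumes that the n jobs succeed independently and that each succeeds with probability exactly 1/(96e) per iteration (the worst case allowed by the proof), and it takes logarithms to base 2. The function names are hypothetical.

```python
import math

Q = 1.0 / (96.0 * math.e)  # per-iteration success probability lower bound from the proof

def expected_iterations(n: int, q: float = Q, tol: float = 1e-12) -> float:
    """E[X] = sum_{i>=0} Pr(X > i) for n independent jobs, each succeeding w.p. q per iteration."""
    total, i = 0.0, 0
    while True:
        # Probability that at least one job is still unfinished after i iterations.
        tail = 1.0 - (1.0 - (1.0 - q) ** i) ** n
        if tail < tol:
            return total
        total += tail
        i += 1

def proof_bound(n: int) -> float:
    """The bound 362 log n + 96e derived in the proof (base-2 logarithm assumed)."""
    return 362 * math.log2(n) + 96 * math.e

for n in (2, 10, 1000):
    print(n, round(expected_iterations(n), 1), round(proof_bound(n), 1))
```

Under these assumptions the exact expectation stays well below the bound, which is consistent with the constants not being optimized.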

4. JOBS WITH PRECEDENCE 
CONSTRAINTS 

In this section, we study SUU when there are non-trivial precedence constraints on the jobs. We first present in §4.1 a polylogarithmic approximation algorithm for the case when the constraints form disjoint chains, and then extend the results in §4.2 to the more general case when the constraints form directed forests. All of the schedules we compute are oblivious.

4.1 Disjoint chains 

We consider SUU in the special case where the dependency graph C for the jobs is a collection of disjoint chains $C = \{C_1, \ldots, C_l\}$. We refer to this problem as SUU-C. If job $j_1$ precedes $j_2$ according to the constraints, we write $j_1 \prec j_2$.

At a high level, our approach to solve SUU-C is to first compute an oblivious schedule of near-optimal length in which every job has a constant probability of successful completion, then replicate this schedule sufficiently many times to conclude that all the jobs are finished with high probability within a desired makespan bound. We first consider the problem of accumulating a constant success probability for each job. As in the independent jobs case, we will use the notion of mass instead of the actual probability. However, we need to take into account the dependencies among the jobs. Therefore, we formulate the following problem AccuMass-C: Given the input for SUU-C, compute an oblivious schedule with minimum length T, subject to two conditions: (i) Every job j accumulates a mass of at least 1/2 within T; (ii) If $j_1 \prec j_2$, $j_1$ must already accumulate mass 1/2 before any machine can be assigned to $j_2$. Condition (ii) captures the intuition that if $j_1$ has a low probability of successful completion before step t, then the probability that $j_2$ is eligible for execution at step t would be small; so it does not make much sense to assign machines to $j_2$ prior to t in the oblivious schedule.

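The following is a relaxed linear program (LP1) for AccuMass-C. Let $x_{ij}$ denote the number of steps during which machine i is assigned to j. Let $d_j$ be the number of steps during which there is some machine assigned to j.

$$
\begin{aligned}
\text{(LP1)} \quad \min\ & t \\
\text{s.t.}\ & \sum_{i \in M} p_{ij} x_{ij} \ge 1/2 && \forall j \in J && (1) \\
& \sum_{j \in J} x_{ij} \le t && \forall i \in M && (2) \\
& \sum_{j \in C_k} d_j \le t && \forall C_k \in C && (3) \\
& 0 \le x_{ij} \le d_j && \forall i \in M,\ j \in J && (4) \\
& d_j \ge 1 && \forall j \in J && (5)
\end{aligned}
$$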

Some comments on (LP1) are in order. Equation (1) enforces Condition (i). Equation (2) bounds the load on every machine, which we define below. Equation (3) bounds the time length on each chain constraint. Finally, Equation (4) ensures that each job accumulates its mass during the $d_j$ steps when there is some machine assigned to it. Let $T^*$ be the optimal value for (LP1) above.
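To make the five constraints concrete, here is a minimal feasibility checker for a candidate solution of (LP1). It is a sketch, not a solver: the matrix encoding `p[i][j]`, `x[i][j]`, the list `d`, the list-of-lists encoding of the chains, and the name `lp1_feasible` are all assumptions made for illustration.

```python
def lp1_feasible(p, x, d, t, chains, eps=1e-9):
    """Check constraints (1)-(5) of (LP1) for a candidate (x, d, t).

    p[i][j], x[i][j]: machine i, job j; d[j]: duration variable of job j;
    chains: list of lists of job indices (assumed encoding of C).
    """
    m, n = len(p), len(p[0])
    mass_ok = all(sum(p[i][j] * x[i][j] for i in range(m)) >= 0.5 - eps
                  for j in range(n))                                  # (1) mass at least 1/2
    load_ok = all(sum(x[i]) <= t + eps for i in range(m))             # (2) machine load
    chain_ok = all(sum(d[j] for j in c) <= t + eps for c in chains)   # (3) chain length
    window_ok = all(-eps <= x[i][j] <= d[j] + eps
                    for i in range(m) for j in range(n))              # (4) 0 <= x_ij <= d_j
    dur_ok = all(d[j] >= 1 - eps for j in range(n))                   # (5) d_j >= 1
    return mass_ok and load_ok and chain_ok and window_ok and dur_ok

# Two machines, two jobs forming a single chain 0 -> 1.
p = [[0.5, 0.25], [0.5, 0.25]]
x = [[1, 1], [0, 1]]
d = [1, 1]
print(lp1_feasible(p, x, d, t=2, chains=[[0, 1]]))    # feasible with t = 2
print(lp1_feasible(p, x, d, t=1.5, chains=[[0, 1]]))  # machine 0 has load 2 > 1.5
```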

Note that (LP1) has no condition that prevents two jobs from different precedence chains from being scheduled on the same machine at the same step. We use the term pseudo-schedule to capture such "schedules", in which different jobs from different precedence chains may be scheduled to the same machine simultaneously.

Definition 4.1. A pseudo-schedule of length $T \in \mathbb{Z}^+ \cup \{\infty\}$ is a collection of assignment functions, $\{f_t : M \to 2^J \mid 1 \le t < T + 1\}$.

Hence, an assignment function of a pseudo-schedule may map a machine to a set of jobs. In this sense, a pseudo-schedule may not be feasible; we address this issue later when we describe how to transform a pseudo-schedule to an appropriate oblivious schedule. An oblivious schedule is a pseudo-schedule in which the value of $f_t$ is a single element.

Definition 4.2. Given a pseudo-schedule $\Sigma_s$ of (finite) length T, $\{f_t : M \to 2^J \mid 1 \le t < T + 1\}$, the load of a machine i is defined as the total number of times that a job is scheduled on i in $\Sigma_s$. Formally, the load of machine i is $\sum_{1 \le t < T+1} |f_t(i)|$. The load of $\Sigma_s$ is defined as the maximum load of any machine.

We remark that a pseudo-schedule of length T may have a 
load greater than T. 
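The two definitions translate directly into a small data structure. The sketch below assumes a pseudo-schedule is stored as a list of per-step dictionaries mapping a machine to a set of jobs; this encoding and the function names are illustrative choices. It also treats an idle machine as acceptable in an oblivious schedule, which is an assumption beyond the definition.

```python
from typing import Dict, List, Set

# Assumed encoding: schedule[t-1][i] is the set of jobs f_t(i); missing keys mean "idle".
PseudoSchedule = List[Dict[int, Set[int]]]

def machine_load(schedule: PseudoSchedule, i: int) -> int:
    """Load of machine i: the sum over steps t of |f_t(i)|."""
    return sum(len(step.get(i, set())) for step in schedule)

def load(schedule: PseudoSchedule, machines: List[int]) -> int:
    """Load of the pseudo-schedule: the maximum load of any machine."""
    return max(machine_load(schedule, i) for i in machines)

def is_oblivious(schedule: PseudoSchedule) -> bool:
    """True if no machine is assigned more than one job at any step."""
    return all(len(jobs) <= 1 for step in schedule for jobs in step.values())

# A pseudo-schedule of length 2 whose load (3, on machine 0) exceeds its length.
example = [{0: {0, 1}, 1: {2}}, {0: {1}}]
print(len(example), load(example, [0, 1]), is_oblivious(example))
```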



Theorem 4.1. Within polynomial time one can round an optimal feasible solution to (LP1), and obtain a pseudo-schedule for Problem AccuMass-C whose length and load are both $O(\log m) \cdot T^*$.

Proof. Obviously (LP1) is feasible because one can assign machines to each job for a finite number of steps so that the job can accumulate a mass of 1/2. Let $\{x_{ij}, d_j, t\}$ be one optimal solution to (LP1). (Note that t is equal to $T^*$.) Our efforts mainly concern the rounding procedure, i.e., obtaining a feasible integral solution from the fractional solution without blowing up t too much. We then describe how to get a pseudo-schedule from an integral solution to (LP1). We differentiate between two cases.

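The first case is when $t \ge |J| = n$. We round each $x_{ij}$ and $d_j$ up by setting $x^*_{ij} = \lceil x_{ij} \rceil$ and $d^*_j = \lceil d_j \rceil$. We obtain a feasible integral solution with approximation factor 2 since we have

$$
\begin{aligned}
\sum_{i \in M} p_{ij} x^*_{ij} &\ge 1/2 && \forall j \in J, \\
\sum_{j \in J} x^*_{ij} &\le t + n \le 2t && \forall i \in M, \\
\sum_{j \in C_k} d^*_j &\le t + n \le 2t && \forall C_k \in C, \\
0 \le x^*_{ij} &\le d^*_j && \forall i \in M,\ j \in J.
\end{aligned}
$$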



The second case is when $t < |J| = n$. We make use of some results from network flow theory for our rounding in this case. Notice that although we target a mass of 1/2, any constant smaller than 1/2 will do as well because we can always scale every variable up to reach that target, sacrificing only a constant factor. In our presentation below, we use many such scale-up operations. (We haven't tried to optimize the constants.) For a given job j, if $\sum_{i \in M,\ x_{ij} \ge 1} p_{ij} x_{ij} \ge 1/4$, we can round these $x_{ij}$'s to the next larger integer. Since $\lceil x_{ij} \rceil \le 2 x_{ij}$, this only incurs a factor of 2 blow up in t. Thus, we only need to consider those jobs j such that $\sum_{i \in M,\ x_{ij} \ge 1} p_{ij} x_{ij} < 1/4$, which implies that $\sum_{i \in M,\ x_{ij} < 1} p_{ij} x_{ij} \ge 1/4$. Observe that $\sum_{i \in M,\ p_{ij} < \frac{1}{8m},\ x_{ij} < 1} p_{ij} x_{ij} < 1/8$, which implies $\sum_{i \in M,\ p_{ij} \ge \frac{1}{8m},\ x_{ij} < 1} p_{ij} x_{ij} \ge 1/8$.

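We bucket these $p_{ij}$'s into at most $B = \lceil \log(8m) \rceil$ intervals $(2^{-(k+1)}, 2^{-k}]$ ($k = 0, 1, \ldots$). For a bucket $b : (2^{-(b+1)}, 2^{-b}]$, if $\sum_{p_{ij} \in \text{bucket } b} x_{ij} < 1/32$, we remove this bucket from further consideration. Note that the sum of $p_{ij} x_{ij}$ over all removed buckets is at most 1/16, since a removed bucket b contributes less than $2^{-b}/32$. Hence for the $p_{ij}$'s in the remaining buckets, we still have

$$\sum_{i \in M,\ p_{ij} \ge \frac{1}{8m},\ x_{ij} < 1} p_{ij} x_{ij} \ge 1/16.$$

For each job j, there is a bucket $b_j : (2^{-(b_j+1)}, 2^{-b_j}]$ such that

$$\sum_{p_{ij} \in \text{bucket } b_j} p_{ij} x_{ij} \ge \frac{1}{16B}.$$

Denote by $D_j$ the corresponding sum of the $x_{ij}$ over bucket $b_j$, that is, $D_j = \sum_{p_{ij} \in \text{bucket } b_j} x_{ij}$; since bucket $b_j$ was not removed, $D_j \ge 1/32$. If necessary, we scale all the $x_{ij}$'s (and other variables) up by a factor of 32, so that all $D_j \ge 1$. We then round $D_j$ down to $\lfloor D_j \rfloor$. These operations only cost us a constant factor in terms of approximation. Thus for the ease of the presentation below, we assume that the $D_j$'s are integral and let $D = \sum_{j \in J} D_j$.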
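The bucketing step for a single job can be sketched as follows. This is an illustration under assumptions: `p_col` and `x_col` hold the values $p_{ij}$ and $x_{ij}$ of one fixed job j over all machines, buckets are indexed by k with interval $(2^{-(k+1)}, 2^{-k}]$, a bucket is dropped when its x-sum is below 1/32, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def bucket_index(p: float) -> int:
    """Return k such that p lies in (2**-(k+1), 2**-k], for 0 < p <= 1."""
    return math.floor(-math.log2(p))

def heavy_buckets(p_col, x_col):
    """Group machines with 0 < x < 1 and p >= 1/(8m) by bucket; drop buckets with x-sum < 1/32."""
    m = len(p_col)
    groups = defaultdict(list)
    for i, (p, x) in enumerate(zip(p_col, x_col)):
        if 0 < x < 1 and p >= 1.0 / (8 * m):
            groups[bucket_index(p)].append(i)
    return {k: ms for k, ms in groups.items()
            if sum(x_col[i] for i in ms) >= 1.0 / 32}

def best_bucket(p_col, x_col) -> int:
    """Pick the surviving bucket with the largest mass (sum of p * x) for this job."""
    kept = heavy_buckets(p_col, x_col)
    return max(kept, key=lambda k: sum(p_col[i] * x_col[i] for i in kept[k]))

p_col = [1.0, 0.3, 0.26, 0.01]
x_col = [0.01, 0.4, 0.4, 0.9]
print(heavy_buckets(p_col, x_col), best_bucket(p_col, x_col))
```

In the example, machine 3 is ignored because its probability is below 1/(8m), and bucket 0 is removed because its x-sum is only 0.01.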

We now construct a network-flow instance as follows (see Figure 3). We have one node for each job j, one node for each machine i, a source node u, and a destination node v. We add an edge for each $x_{ij}$ contributing to the computation of the $D_j$'s. We orient the edge (i, j) from j to i, with edge capacity $\lceil d_j \rceil$. From each machine node i, add an edge toward v, with capacity $\lceil 2t \rceil$. For each job node j, add an edge from u to j, with capacity $D_j$.




Figure 3: A network flow instance for the rounding 
of an optimal solution to (LPl) 

The argument before the construction shows that a flow of demand D at u can be pushed through the network, where the $x_{ij}$'s specify such a feasible flow. D is actually the maximum flow of the network (consider the cut where one side consists of u alone). From the Ford-Fulkerson theorem [8], [5], we know that there exists an integral feasible flow when the parameters are integral, as in our instance. We take such an integral flow value on edge (j, i) as our rounded solution $x'_{ij}$. Furthermore, the integral solution obtained satisfies the following inequalities.

$$
\begin{aligned}
\sum_{i \in M} p_{ij} x'_{ij} &= \Omega\left(\frac{1}{\log m}\right) && \forall j \in J, \\
\sum_{j \in J} x'_{ij} &\le \lceil 2t \rceil && \forall i \in M, \\
\sum_{j \in C_k} \lceil d_j \rceil &\le \lceil 2t \rceil && \forall C_k \in C, \\
x'_{ij} &\le \lceil d_j \rceil.
\end{aligned}
$$

Raising all the values by a factor of O(log m), we obtain an integral feasible solution $\{\hat{x}_{ij}, \hat{d}_j, \hat{t}\}$, where $\hat{t} = O(\log m) \cdot T^*$.
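The integrality argument can be illustrated on a toy instance. The sketch below uses a plain Edmonds-Karp augmenting-path routine (a standard way to obtain an integral maximum flow, swapped in here for illustration; the paper only invokes the integrality theorem). The node labels, the helper names, and the instance data are all hypothetical.

```python
from collections import deque

def max_flow(cap, source, sink):
    """Edmonds-Karp on integer capacities {(a, b): c}; returns (value, flow on original edges)."""
    res = dict(cap)
    for (a, b) in cap:
        res.setdefault((b, a), 0)           # residual reverse edges
    adj = {}
    for (a, b) in res:
        adj.setdefault(a, []).append(b)
    value = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            a = queue.popleft()
            for b in adj.get(a, []):
                if b not in parent and res[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            break
        bottleneck, b = None, sink
        while parent[b] is not None:
            c = res[(parent[b], b)]
            bottleneck = c if bottleneck is None else min(bottleneck, c)
            b = parent[b]
        b = sink
        while parent[b] is not None:         # push the bottleneck along the path
            res[(parent[b], b)] -= bottleneck
            res[(b, parent[b])] += bottleneck
            b = parent[b]
        value += bottleneck
    return value, {e: cap[e] - res[e] for e in cap}

def rounding_network(D, d_ceil, bucket_edges, two_t_ceil):
    """u -> job j (cap D_j), job j -> machine i (cap ceil(d_j)), machine i -> v (cap ceil(2t))."""
    cap = {}
    for j, Dj in D.items():
        cap[("u", ("job", j))] = Dj
    for (j, i) in bucket_edges:
        cap[(("job", j), ("mach", i))] = d_ceil[j]
        cap[(("mach", i), "v")] = two_t_ceil
    return cap

cap = rounding_network(D={0: 2, 1: 1}, d_ceil={0: 2, 1: 1},
                       bucket_edges=[(0, 0), (0, 1), (1, 1)], two_t_ceil=2)
value, flow = max_flow(cap, "u", "v")
print(value, {e: f for e, f in flow.items() if f > 0})
```

Here the maximum flow equals D = 3 and every edge carries an integer amount, so the flow on each job-to-machine edge can serve as the rounded value.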

We now describe how to construct from the integral solution a pseudo-schedule $\Sigma_s$ whose length and load are both bounded by $\hat{t} = O(\log m) \cdot T^*$. Consider a job j in a chain $C_k \in C$. Given the $\hat{x}_{ij}$, let $L_j = \max_i \hat{x}_{ij}$. Let $\psi_j = \sum_{j_0 : j_0 \prec j} L_{j_0}$. We assign the machines to j within a step interval of length $L_j$ from step $\psi_j + 1$ to $\psi_j + L_j$, using each machine i $\hat{x}_{ij}$ times. In other words, the assignment functions for chain $C_k$ are specified as follows. For any job j and machine i, if $\hat{x}_{ij} > 0$, $f^k_t(i) = \{j\}$ for $t \in [\psi_j + 1, \psi_j + \hat{x}_{ij}]$. This can be done because each machine is assigned to j at most $L_j$ times and different machines can be assigned to j at the same step. After we define the $f^k_t(\cdot)$ for every chain $C_k \in C$, we define the assignment functions for $\Sigma_s$ as

$$f_t(i) = \bigcup_{k : C_k \in C} f^k_t(i) \quad \text{for } i \in M,\ t \in [1, \hat{t}].$$

Recall that the range of the assignment functions for a pseudo-schedule is a set of jobs. This completes the proof of the theorem. □
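The construction at the end of the proof can be written out as a short routine. This is a sketch under assumptions: `x_hat[i][j]` is the integral number of steps machine i spends on job j, each chain is a list of jobs in precedence order, and the pseudo-schedule is returned as a dictionary keyed by (step, machine); these encodings and the name `build_pseudo_schedule` are illustrative.

```python
def build_pseudo_schedule(x_hat, chains):
    """Place each job j of a chain in the window [psi_j + 1, psi_j + L_j], L_j = max_i x_hat[i][j]."""
    m = len(x_hat)
    sched = {}  # (t, i) -> set of jobs, with steps t starting at 1
    for chain in chains:
        psi = 0  # total window length of the predecessors of the current job
        for j in chain:
            L_j = max(x_hat[i][j] for i in range(m))
            for i in range(m):
                # Machine i works on j during the first x_hat[i][j] steps of the window.
                for t in range(psi + 1, psi + x_hat[i][j] + 1):
                    sched.setdefault((t, i), set()).add(j)
            psi += L_j
    return sched

# Two machines; chain 0 -> 1 and a second chain consisting of job 2 alone.
x_hat = [[2, 1, 1],
         [1, 0, 2]]
sched = build_pseudo_schedule(x_hat, chains=[[0, 1], [2]])
for key in sorted(sched):
    print(key, sorted(sched[key]))
```

In the example both machines receive jobs 0 and 2 at step 1, which is exactly the kind of collision between chains that a pseudo-schedule permits.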

We now relate AccuMass-C to SUU-C. Recall that $T^*$ is the optimal value of (LP1), which we wrote for Problem AccuMass-C, and $T^{OPT}$ is the expected makespan of an optimum schedule $\Sigma$ for Problem SUU-C. We now bound the value $T^*$ in terms of $T^{OPT}$ in Lemma 4.2. This lemma, together with Theorem 4.1, immediately yields a pseudo-schedule that solves AccuMass-C with load and length within an O(log m) factor of $T^{OPT}$.

Lemma 4.2. $T^* \le 16 T^{OPT}$.



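Proof. The following linear program is the same as (LP1), except that 1/2 is replaced by 1/16 and t is replaced by $2T^{OPT}$. We argue that this linear program is feasible.

$$
\begin{aligned}
\sum_{i \in M} p_{ij} x_{ij} &\ge 1/16 && \forall j \in J \\
\sum_{j \in J} x_{ij} &\le 2T^{OPT} && \forall i \in M \\
\sum_{j \in C_k} d_j &\le 2T^{OPT} && \forall C_k \in C \\
0 \le x_{ij} &\le d_j && \forall i \in M,\ j \in J \\
d_j &\ge 1 && \forall j \in J
\end{aligned}
$$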

Consider the first $2T^{OPT}$ execution steps using an optimal schedule $\Sigma$. Let random variable $X_{ij}$ be the number of steps in which i is assigned to j. Let random variable $Y_j$ be the total number of steps when there is some machine assigned to j. We know from Theorem 2.2 that with probability at least 1/4, j accumulates at least 1/4 mass within $2T^{OPT}$ steps. This amounts to the fact that the expected accumulated mass for j is at least 1/16. Thus

$$\sum_{i \in M} p_{ij} \cdot E[X_{ij}] \ge 1/16.$$

Since in $\Sigma$ a machine is assigned to at most one job at any step, $\sum_{j \in J} X_{ij} \le 2T^{OPT}$. So

$$\sum_{j \in J} E[X_{ij}] \le 2T^{OPT}.$$

Since we are considering only $2T^{OPT}$ steps of $\Sigma$, we have $\sum_{j \in C_k} Y_j \le 2T^{OPT}$ for every chain $C_k \in C$. Obviously, $X_{ij} \le Y_j$. Taking the expectation, we have

$$\sum_{j \in C_k} E[Y_j] \le 2T^{OPT} \quad \text{and} \quad E[X_{ij}] \le E[Y_j].$$

We conclude that $x_{ij} = E[X_{ij}]$ for $i \in M, j \in J$ and $d_j = E[Y_j]$ for $j \in J$ form a solution to the linear program. Raising this solution by a factor of 8, we obtain a solution to (LP1). This means that a t of value $16T^{OPT}$ is achievable in (LP1). We have thus proved that $T^* \le 16T^{OPT}$. This completes the proof of the lemma. □

Theorem 4.3. A pseudo-schedule with length and load bounded by $O(\log m) \cdot T^{OPT}$ can be computed within polynomial time, such that: (i) Every job j accumulates at least 1/2 mass; (ii) If $j_1 \prec j_2$, $j_2$ can only begin the accumulation after $j_1$ accumulates 1/2 mass. □

In the remainder of this section, we describe how to convert a pseudo-schedule obtained from Theorem 4.3 to a feasible schedule. According to Theorem 4.3, we can compute a pseudo-schedule $\Sigma_s$ of length $O(\log m) \cdot T^{OPT}$ in which every job accumulates a mass of at least 1/2, and hence a success probability of at least $\frac{1}{2e}$. Moreover, if $j_1 \prec j_2$, no machine is assigned to $j_2$ until $j_1$ has accumulated 1/2 such mass. We now convert $\Sigma_s$ to a (feasible) oblivious schedule $\Sigma_o$ in two steps.

1. We use the elegant random delay technique of [19], [27] to delay the start step of the execution for each chain appropriately and obtain a new pseudo-schedule $\Sigma_{s,1}$ in which the number of jobs scheduled on any machine at any step is $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$. The randomized schedule can also be derandomized using techniques from [22], [25], [27]. We then "flatten" $\Sigma_{s,1}$ to obtain an oblivious schedule $\Sigma_{o,1}$, sacrificing a factor of $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$ in the schedule's length.

2. To obtain the final oblivious schedule $\Sigma_o$, we take the oblivious schedule $\Sigma_{o,1}$ from above and replicate each step's machine assignment O(log n) times, so that all jobs will be finished with high probability.

We now describe in detail the two steps that convert a pseudo-schedule to a feasible oblivious schedule. Since the second step is simpler, we describe it first.

Schedule replication: We first replicate $\Sigma_{o,1}$ at each step by a factor of $\sigma = 16 \log n$ to get another oblivious schedule $\Sigma_{o,2}$. More precisely, let T denote $\Sigma_{o,1}$'s length and let the $g_t(\cdot)$'s be the assignment functions of $\Sigma_{o,1}$. We define the assignment functions $f_t(\cdot)$'s of $\Sigma_{o,2}$ as follows. For any $t \in [1, \sigma \cdot T]$, $f_t(\cdot) = g_\tau(\cdot)$, where $\tau = \lfloor \frac{t-1}{\sigma} \rfloor + 1$. Note that if $\Sigma_{o,1}$ can be specified in space polynomial in the size of the input, as we will show in the "delay" step, so can $\Sigma_{o,2}$.

We define yet another oblivious schedule $\Sigma_{o,3}$ of length n as follows. Topologically sort the jobs according to the precedence constraints, e.g., by appending the precedence chains one after another, and let $j_1, \ldots, j_n$ be the jobs in the sorted order. The assignment functions $h_t(\cdot)$'s for $\Sigma_{o,3}$ are specified as follows: $\forall i \in M$, $h_t(i) = j_t$, where $1 \le t \le n$. Now the final oblivious schedule we want is $\Sigma_o = \Sigma_{o,2} \circ \Sigma_{o,3}^\infty$. In other words, oblivious schedule $\Sigma_o$ is simply the replicated $\Sigma_{o,1}$ followed by assigning all the machines to some job at each step.
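The replication and the round-robin tail can be combined into one lookup function. The sketch below assumes the oblivious schedule is a list of per-step dictionaries mapping machines to single jobs, with steps counted from 1; the names `replicate` and `final_assignment` are hypothetical.

```python
def replicate(sched, sigma):
    """Return f_1..f_{sigma*T} with f_t = g_tau, tau = floor((t-1)/sigma) + 1."""
    return [sched[(t - 1) // sigma] for t in range(1, sigma * len(sched) + 1)]

def final_assignment(sched, sigma, order, machines, t):
    """Assignment at step t: the replicated schedule first, then all machines on one job in turn."""
    T = len(sched)
    if t <= sigma * T:
        return sched[(t - 1) // sigma]            # replicated part
    job = order[(t - sigma * T - 1) % len(order)]  # round-robin part, period n
    return {i: job for i in machines}

sched = [{0: "a"}, {0: "b"}]   # a toy oblivious schedule of length T = 2
order = ["a", "b"]              # jobs in topological order
for t in range(1, 8):
    print(t, final_assignment(sched, 2, order, [0, 1], t))
```

The lookup form reflects the remark that the final schedule needs only polynomial space: the infinite tail is never materialized.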

We now analyze the expected makespan of $\Sigma_o$. If all jobs are successfully completed within step $\sigma T$, the expected makespan is at most $\sigma T$. The probability that this does not happen is at most $n\left(1 - \frac{1}{2e}\right)^{\sigma} \le 1/n^2$. Notice also that from step $\sigma T + 1$ on, $\Sigma_o$ assigns all the machines to a single job at each step periodically (due to $\Sigma_{o,3}$, with a period length of n). The expected number of steps for a job to be completed is at most $T^{OPT}$ if all the machines are assigned to it. Since we periodically assign the machines to any fixed job, on average, it takes at most $n T^{OPT}$ steps to complete any fixed job. Hence, on average, it takes at most $n^2 T^{OPT}$ steps to complete all the jobs using the assignment functions beyond step $\sigma T$. The expected makespan of $\Sigma_o$ is thus at most

$$\left(1 - \frac{1}{n^2}\right) \sigma \cdot T + \frac{1}{n^2} \cdot \left(\sigma \cdot T + n^2 T^{OPT}\right).$$

As we will prove shortly, $T = O\left(\log m \cdot \frac{\log(n+m)}{\log\log(n+m)}\right) \cdot T^{OPT}$ and $\sigma = 16 \log n$. We conclude that the expected makespan of $\Sigma_o$ is $O\left(\log n \log m \cdot \frac{\log(n+m)}{\log\log(n+m)}\right) \cdot T^{OPT}$.

Converting pseudo-schedule $\Sigma_s$ to an oblivious schedule: We now address the issue that the computed pseudo-schedule $\Sigma_s$ from Theorem 4.3 may not yet be feasible, that is, some machine may be assigned to more than one job at the same step. We claim that we can convert $\Sigma_s$ to an oblivious schedule $\Sigma_{o,1}$ by sacrificing a factor of $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$.

Let $\Pi_{\max}$ be the load of $\Sigma_s$, i.e., the maximum number of jobs assigned to any machine. A result by Shmoys, Stein and Wein on the job shop scheduling problem [27, Lemma 2.1] states that if we delay the starting step of each chain by an integral amount independently and uniformly chosen from $[0, \Pi_{\max}]$, the resulting pseudo-schedule has no more than $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$ jobs scheduled on any machine during any step. We now explain what we mean by the term delay. Recall that in the last paragraph of the proof for Theorem 4.1, we first specify a function $f^k_t$ for each constraint chain $C_k \in C$, and then define the assignment function for $\Sigma_s$ as $f_t = \bigcup_k f^k_t$. Suppose that a chain $C_k$ is delayed by an amount of $\phi_k$. Then the assignment function $g^k_t$ for chain $C_k$ is modified as follows: $\forall i \in M$, if $t \le \phi_k$, $g^k_t(i) = \emptyset$; otherwise, $g^k_t(i) = f^k_{t - \phi_k}(i)$. The assignment function for the schedule is then defined as $f_t = \bigcup_k g^k_t$. To make our presentation self-contained, we now outline the argument for the bound of $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$ below.
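The delay operation and the resulting congestion can be sketched in a few lines. The encoding is an assumption: each chain's assignment functions are stored as a list of per-step dictionaries mapping machines to sets of jobs, and the names `delay_chain`, `merge`, and `congestion` are hypothetical. In the randomized algorithm the delays would be drawn uniformly from $[0, \Pi_{\max}]$; the example uses fixed delays so that it is reproducible.

```python
def delay_chain(f_k, phi):
    """g_t = empty for t <= phi, and g_t = f_{t - phi} afterwards."""
    return [dict() for _ in range(phi)] + [dict(step) for step in f_k]

def merge(chain_scheds):
    """f_t(i) = union over chains k of g^k_t(i)."""
    length = max(len(s) for s in chain_scheds)
    merged = [dict() for _ in range(length)]
    for s in chain_scheds:
        for t, step in enumerate(s):
            for i, jobs in step.items():
                merged[t].setdefault(i, set()).update(jobs)
    return merged

def congestion(sched):
    """Maximum number of jobs placed on one machine in one step."""
    return max((len(jobs) for step in sched for jobs in step.values()), default=0)

chain_a = [{0: {1}}, {0: {2}}]   # jobs 1 then 2 on machine 0
chain_b = [{0: {3}}]             # job 3 on machine 0
print(congestion(merge([chain_a, chain_b])))                   # both chains collide at step 1
print(congestion(merge([chain_a, delay_chain(chain_b, 2)])))   # a delay of 2 removes the collision
```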

Fix a step t and a machine i. Let $p = \Pr[\text{at least } \tau \text{ units of processing are scheduled on machine } i \text{ at step } t]$. Since a job j may be scheduled in multiple steps and each job is unit-step, we may equivalently regard job j as having multiple processing units. There are at most $\binom{\Pi_{\max}}{\tau}$ ways to choose those $\tau$ processing units. Focus on a particular choice of $\tau$ units. If these units are from different chains, the probability that they are all scheduled at step t is at most $\left(\frac{1}{\Pi_{\max}}\right)^{\tau}$ since we choose the delay independently and uniformly from $[0, \Pi_{\max}]$. Otherwise, the probability is 0 because our pseudo-schedule can never assign two units from the same chain to the same machine at the same step. Therefore,

$$p \le \binom{\Pi_{\max}}{\tau} \left(\frac{1}{\Pi_{\max}}\right)^{\tau} \le \left(\frac{e\,\Pi_{\max}}{\tau}\right)^{\tau} \left(\frac{1}{\Pi_{\max}}\right)^{\tau} = \left(\frac{e}{\tau}\right)^{\tau}.$$

If $\tau = \alpha \frac{\log(n+m)}{\log\log(n+m)}$, then $p < (n+m)^{-(\alpha-1)}$. Let $L_{\max}$ be the length of the longest chain according to $\Sigma_s$. The probability that any machine at any step is assigned at least $\alpha \frac{\log(n+m)}{\log\log(n+m)}$ jobs is bounded by $m(\Pi_{\max} + L_{\max})(n+m)^{-(\alpha-1)}$. With the assumption, which we will remove shortly, that $T^{OPT}$ is bounded by a polynomial in (n + m), $\Pi_{\max} + L_{\max}$ is bounded by a polynomial in (n + m) as well. If we choose $\alpha$ to be sufficiently large, then with high probability, no more than $\alpha \frac{\log(n+m)}{\log\log(n+m)}$ jobs are scheduled on any machine at any step.

Shmoys, Stein and Wein [27] also derandomize the algorithm so that O(log(n + m)) jobs can be scheduled on any machine simultaneously, based on results by [23], [24], [22]. Schmidt, Siegel and Srinivasan [25] give a different derandomization strategy and obtain a collision bound matching the randomized algorithm, i.e., $O\left(\frac{\log(n+m)}{\log\log(n+m)}\right)$ jobs scheduled simultaneously on any machine. We denote this (derandomized) pseudo-schedule by $\Sigma_{s,1}$, whose length is at most twice that of $\Sigma_s$. Since, according to Theorem 4.3, the length of $\Sigma_s$ is $O(\log m) \cdot T^{OPT}$, it follows that we can "flatten" $\Sigma_{s,1}$ out to obtain an oblivious schedule $\Sigma_{o,1}$ whose length is $O\left(\log m \cdot \frac{\log(n+m)}{\log\log(n+m)}\right) \cdot T^{OPT}$, in which each machine is assigned to one job at any step. We comment that the random delay technique originates in [19], which studies the job shop scheduling problem.
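One straightforward way to "flatten" a pseudo-schedule is to expand every step into as many consecutive steps as the largest number of jobs any machine holds at that step. The sketch below follows this reading, which is an assumption about the intended procedure; the list-of-dictionaries encoding and the name `flatten` are illustrative.

```python
def flatten(pseudo):
    """Expand step t into c_t oblivious steps, c_t = max number of jobs on one machine at step t.

    Step order is preserved, so a job never moves ahead of work scheduled in an earlier step.
    """
    out = []
    for step in pseudo:
        c = max((len(jobs) for jobs in step.values()), default=0)
        ordered = {i: sorted(jobs) for i, jobs in step.items()}
        for r in range(max(c, 1)):  # keep one (idle) step even when nothing is scheduled
            out.append({i: js[r] for i, js in ordered.items() if r < len(js)})
    return out

pseudo = [{0: {1, 2}, 1: {3}}, {0: {4}}]
print(flatten(pseudo))  # three steps; each machine runs at most one job per step
```

If every step has congestion at most c, the flattened schedule is at most c times longer, which matches the factor sacrificed in the text.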

Reducing $T^{OPT}$: We now address the issue that $T^{OPT}$ is not always bounded by a polynomial in (n + m). We make use of a trick from [27, Section 3.1]. Consider the pseudo-schedule $\Sigma_s$ computed in Theorem 4.3. For each job j, let $l_{ij}$ be the number of steps in which machine i is assigned to j and $L_j$ be $\max_i l_{ij}$. Denote $\max_j L_j$ by L. We know that all machines are assigned to j within a window of length $L_j$. Let $\beta = nm$. Round each $l_{ij}$ down to the nearest multiple of $L/\beta$, and denote this value by $l'_{ij}$. We therefore can treat the $l'_{ij}$ as integers in $\{0, \ldots, \beta\}$. A schedule for this new problem can be trivially rescaled to one with the real values $l'_{ij}$. Since $\beta = nm$, the schedule now effectively has a length (and load) bounded by a polynomial in (n + m). Hence our discussions of the random delay and derandomization hold now. Let $\Sigma'$ be the resulting feasible oblivious schedule, with length bounded by $O\left(\log m \cdot \frac{\log(n+m)}{\log\log(n+m)}\right) T^{OPT}$ and load bounded by $O(\log m) T^{OPT}$. To get a feasible oblivious schedule $\Sigma_{o,1}$ so that every job accumulates 1/2 mass, we insert $(l_{ij} - l'_{ij})$ units of processing to $\Sigma'$. The insertion can be done in a way that preserves the precedence constraints, i.e., if $j_1 \prec j_2$, then no machine can be assigned to $j_2$ before $j_1$ accumulates 1/2 mass. Since each insertion lengthens $\Sigma'$ by an amount of at most $L/\beta$ and we have at most nm such insertions, the length of the schedule is increased by at most L. The loads on the machines are the same as before the rounding. Note that L is bounded by $\Pi_{\max}$, which is $O(\log m) T^{OPT}$. We thus have obtained a feasible oblivious schedule whose length is $O\left(\log m \cdot \frac{\log(n+m)}{\log\log(n+m)}\right) T^{OPT}$, in which every job accumulates a constant mass. Finally, we use the replication technique discussed earlier in this section to obtain the desired schedule.
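The rounding of the step counts can be sketched with exact arithmetic. The code assumes integer inputs `l[i][j]` with at least one positive entry, and reads the rounding unit as $L/\beta$ with $\beta = nm$; the function name `round_durations` is hypothetical.

```python
from fractions import Fraction

def round_durations(l, beta):
    """Round each l[i][j] down to a multiple of L/beta, where L is the largest entry.

    Returns the rounded values expressed as integers in {0, ..., beta} (in units of L/beta)
    and the total amount of processing removed by the rounding.
    """
    L = max(max(row) for row in l)
    unit = Fraction(L, beta)
    scaled = [[(v * beta) // L for v in row] for row in l]  # floor(v / unit)
    slack = sum(Fraction(v) - s * unit
                for row, srow in zip(l, scaled) for v, s in zip(row, srow))
    return scaled, slack

l = [[7, 3], [5, 10]]                        # m = 2 machines, n = 2 jobs, so beta = 4
scaled, slack = round_durations(l, beta=4)
print(scaled, slack)                         # total slack is at most L = 10
```

Each entry loses less than one unit $L/\beta$, so re-inserting the removed processing lengthens the schedule by at most $nm \cdot L/\beta = L$.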

Theorem 4.4. For Problem SUU-C, there exists a polynomial-time algorithm to compute an oblivious schedule with expected makespan within a factor of $O\left(\log m \log n \cdot \frac{\log(n+m)}{\log\log(n+m)}\right)$ of the optimal. □



For independent jobs, i.e., when the set of constraints C in Problem SUU-C is empty, we can prove a bound for oblivious schedules that slightly improves over the result stated at the end of §3.

Theorem 4.5. For Problem SUU-I, there exists a polynomial-time algorithm to compute an oblivious schedule with expected makespan within a factor of $O(\log n \cdot \log(\min\{n, m\}))$ of the optimal.

Proof. Let (LP2) be the linear program obtained from (LP1) by removing constraints (3), (4), (5), and let $T_2^*$ be (LP2)'s optimal value. We first show that one can round an optimal feasible solution to (LP2), and obtain an oblivious schedule for Problem AccuMass-C, whose length, and hence load, are both $O(\log(\min\{n, m\})) \cdot T_2^*$.

For Problem SUU-I, Condition (ii) of AccuMass-C is void. We thus don't need constraints (3), (4), (5) when writing the linear program. The rounding in the proof of Theorem 4.1 gives an O(log m) blow-up. If m > n, we can do a better analysis for the rounding procedure. Since there are n + m non-trivial constraints in (LP2), there are at most n + m nonzero values in any basic feasible solution [2], [26]. In an optimal solution $\{x_{ij}, t\}$ (which is basic feasible), we may assume without loss of generality that for any machine i, there exists a j such that $x_{ij} > 0$. Otherwise, we may remove that machine from consideration in (LP2). From here, we conclude that the number of machines i that have at least two $x_{ij} > 0$ is at most n. When we round the $x_{ij}$'s, we only need to consider these machines i with at least two $x_{ij} > 0$. Then the same rounding procedure in the proof of Theorem 4.1 gives a factor O(log n) blow-up because for each job, we only need to consider O(log n) buckets.

We conclude that one can obtain an integral feasible solution $\{\hat{x}_{ij}, \hat{t}\}$ where $\hat{t} = O(\log(\min\{n, m\})) \cdot T_2^*$. Furthermore, from $\{\hat{x}_{ij}, \hat{t}\}$, one can construct a (feasible) oblivious schedule for Problem AccuMass-C, whose length, and hence load, are $\hat{t} = O(\log(\min\{n, m\})) \cdot T_2^*$. This is because the load on each machine is bounded by $\hat{t}$ according to Equation (2) and the jobs are independent. Hence the machine assignment can be done in such a way that no more than one job is scheduled on any machine at any step.

We thus have an oblivious schedule in which every job accumulates a constant mass within time that is at most $O(\log(\min\{n, m\}))$ times optimal. We now apply the schedule replication step and obtain the desired bound. □

4.2 Tree-like precedence constraints 

Our algorithm for tree-like precedence constraints uses techniques from [17], who extend the work of [27] on scheduling unrelated parallel machines with chain precedence constraints to the case where there are tree-like precedence constraints by decomposing the directed forests into O(log n) collections of chains. To state their result, we first introduce some notation used in [17]. Given a dag G(V, E), let $d_{in}(u)$ and $d_{out}(u)$ denote the in-degree and out-degree, respectively, of u in G. A chain decomposition of G is a partition of its vertex set into subsets $B_1, \ldots, B_\lambda$ (called blocks) such that: (i) The subgraph induced by each block $B_i$ is a collection of vertex-disjoint directed chains; (ii) For any $u, v \in V$, let $u \in B_i$ be an ancestor of $v \in B_j$. Then, either i < j, or i = j and u and v belong to the same directed chain of $B_i$; (iii) If $d_{out}(u) > 1$, then none of its out-neighbors are in the same block as u. The chain-width of a dag is the minimum value $\lambda$ such that there is a chain decomposition of the dag into $\lambda$ blocks. We now state the decomposition result.
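The three conditions of a chain decomposition can be verified mechanically. The checker below is a sketch under assumptions: vertices are numbered 0..n-1, the dag is given as a list of directed edges, and `block[v]` is the index of the block containing v. It only validates a given partition; it does not compute the decomposition of [17].

```python
def is_chain_decomposition(n, edges, block):
    """Check conditions (i)-(iii) for a dag on vertices 0..n-1 and a block assignment."""
    succ = {v: [] for v in range(n)}
    for u, v in edges:
        succ[u].append(v)
    indeg, outdeg = [0] * n, [0] * n
    chain = list(range(n))  # union-find over edges that stay inside a block

    def find(v):
        while chain[v] != v:
            chain[v] = chain[chain[v]]
            v = chain[v]
        return v

    for u, v in edges:
        if block[u] == block[v]:
            outdeg[u] += 1
            indeg[v] += 1
            chain[find(u)] = find(v)
    # (i) inside a block every vertex has in-degree and out-degree at most 1.
    if any(deg > 1 for deg in indeg + outdeg):
        return False
    # (iii) a vertex with several out-neighbors shares its block with none of them.
    for u in range(n):
        if len(succ[u]) > 1 and any(block[v] == block[u] for v in succ[u]):
            return False
    # (ii) every descendant lies in a later block, or in the same chain of the same block.
    for u in range(n):
        stack, seen = list(succ[u]), set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(succ[v])
            if block[u] > block[v]:
                return False
            if block[u] == block[v] and find(u) != find(v):
                return False
    return True

edges = [(0, 1), (0, 2), (1, 3)]  # a small out-tree rooted at 0
print(is_chain_decomposition(4, edges, [0, 1, 1, 1]))  # root alone, then chains {1, 3} and {2}
print(is_chain_decomposition(4, edges, [0, 0, 0, 0]))  # a single block violates (i) and (iii)
```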

Lemma 4.6 ([17], Lemma 1). Every dag whose underlying undirected graph is a forest has a chain decomposition of width $\gamma$, where $\gamma \le 2(\lceil \log n \rceil + 1)$. The decomposition can be computed within polynomial time.

Using Lemma 4.6, we simply decompose a given directed forest into at most $\gamma = O(\log n)$ blocks, and within each block, apply our algorithm for the chain case (Theorem 4.4). Since the optimal expected makespan on any subgraph (subset of jobs) is a lower bound for that of the whole graph (whole set of jobs), this approach gives up another factor of log n. We have thus obtained

Theorem 4.7. For Problem SUU, if the dependency graph C is a directed forest, there exists a polynomial-time algorithm to compute an oblivious schedule with expected makespan within a factor of $O\left(\log m \log^2 n \cdot \frac{\log(n+m)}{\log\log(n+m)}\right)$ of the optimal.

When the precedence constraints form a collection of out-trees (rooted trees with edges directed away from the root) or in-trees (defined analogously), we can obtain an improved approximation algorithm by again following the ideas of [17]. More specifically, we decompose the out/in-trees into O(log n) blocks; then randomly delay each chain by a number of steps chosen uniformly from $[0, O(\Pi_{\max} / \log n)]$ (this step can be derandomized in polynomial time); and prove that with high probability, at most O(log n) jobs can be scheduled on any machine simultaneously.

Theorem 4.8. For Problem SUU, if the dependency graph C is a collection of out/in-trees, there exists a polynomial-time algorithm to compute an oblivious schedule with expected makespan within a factor of $O(\log m \log^2 n)$ of the optimal.

5. OPEN PROBLEMS 

In this paper, we have presented polylogarithmic approximation algorithms for the problem of multiprocessor scheduling under uncertainty, for special classes of dependency graphs. We believe that our bounds are not tight; in particular, we conjecture that a more careful analysis will improve the approximation ratios by an O(log n) factor in each case. It will also be interesting to obtain approximations for more general classes of dependencies, and to consider online versions of our scheduling problem.